TABLE 2.2
Evaluating the components of Q-DETR-R50 on the VOC dataset.

Method             #Bits      AP50    #Bits    AP50    #Bits    AP50
Real-valued        32-32-32   83.3    -        -       -        -
Baseline           4-4-8      78.0    3-3-8    76.8    2-2-8    69.7
+DA                4-4-8      78.8    3-3-8    78.0    2-2-8    71.6
+FQM               4-4-8      81.5    3-3-8    80.9    2-2-8    74.9
+DA+FQM (Q-DETR)   4-4-8      82.7    3-3-8    82.1    2-2-8    76.4

Note: #Bits (W-A-Attention) denotes the bit-widths of weights, activations, and attention activations. DA denotes the distribution alignment module. FQM denotes foreground-aware query matching.
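Since the #Bits notation is central to reading the table, the following minimal sketch of a symmetric uniform quantizer illustrates what a setting such as 4-4-8 means in practice. It is a generic illustration only, not necessarily Q-DETR's exact quantization scheme; the tensor shapes are arbitrary placeholders.

```python
import torch

def uniform_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-tensor uniform quantization to the given bit-width.

    A generic sketch only; Q-DETR additionally aligns query
    distributions (DA) rather than relying on this naive scheme.
    """
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit signed values
    scale = x.abs().max().clamp(min=1e-8) / qmax  # simplest per-tensor scale
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

# A "4-4-8" configuration: 4-bit weights, 4-bit activations,
# and 8-bit attention activations.
weight = torch.randn(256, 256)
act = torch.randn(8, 256)
attn = torch.softmax(torch.randn(8, 100, 100), dim=-1)

w_q = uniform_quantize(weight, bits=4)
a_q = uniform_quantize(act, bits=4)
attn_q = uniform_quantize(attn, bits=8)
```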
In the 2-bit setting, the DA module improves the baseline by 1.9%, and the FQM achieves a 5.2% performance improvement. When DA and FQM are combined, the improvement reaches 6.7%.
Information analysis. We further show the information plane, following [238], in Fig. 2.12. We adopt the test AP50 to quantify I(y^GT; E, q). We employ a reconstruction decoder to decode the encoded feature E into a reconstruction of the input and quantify I(X; E) using the ℓ1 loss. As shown in Fig. 2.12, the curve of the larger teacher DETR-R101 generally lies to the right of the curves of the smaller student models, indicating a greater capacity for information representation. Likewise, the purple curve (Q-DETR-R50) generally lies to the right of the other three student curves, showing the improvements in information representation brought by the proposed methods.
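As a concrete illustration of how I(X; E) can be proxied by reconstruction quality, the sketch below pairs a lightweight decoder with an ℓ1 loss. The decoder architecture and the feature shapes are assumptions made for illustration; the chapter specifies only that a reconstruction decoder and the ℓ1 loss are used.

```python
import torch
import torch.nn as nn

class ReconstructionDecoder(nn.Module):
    """Decodes encoder features E back to the input image.

    A hypothetical lightweight decoder: five stride-2 transposed
    convolutions upsample a stride-32 feature map to input resolution.
    """
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 3, 3, padding=1),  # project back to RGB
        )

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        return self.up(e)

decoder = ReconstructionDecoder()
x = torch.randn(2, 3, 224, 224)   # input images (assumed resolution)
e = torch.randn(2, 256, 7, 7)     # encoder features E (stride-32 map)

# Lower reconstruction loss indicates that E retains more information
# about X, i.e., a higher I(X; E).
loss = nn.functional.l1_loss(decoder(e), x)
```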